Entry Name:  Purdue-Chae-GC

VAST 2015 Challenge
Grand Challenge

 

 

Team Members:

Junghoon Chae, Purdue University, jchae@purdue.edu     PRIMARY
Guizhen Wang, Purdue University, wang1908@purdue.edu
Benjamin Ahlbrand, Purdue University, bahlbran@purdue.edu
Mahesh Babu Gorantla, Purdue University, mgorantl@purdue.edu
Jiawei Zhang, Purdue University, zhan1486@purdue.edu
Siqiao Chen, Purdue University, chen1722@purdue.edu
Hanye Xu, Purdue University, xu193@purdue.edu
Jieqiong Zhao, Purdue University, zhao413@purdue.edu
William Hatton, United States Air Force Academy, C16william.hatton@usafa.edu
Abish Malik, Purdue University, amalik@purdue.edu
Sungahn Ko, Purdue University, ko@purdue.edu
David S. Ebert, Purdue University, ebertd@purdue.edu


Student Team:  NO

 

Analytic Tools Used:

Our customized visual analytics tool DinofunVis

Tableau

MS Excel

R

Gephi

We also applied our algorithms for clustering. Please refer to the Appendix at the bottom of this document.

 

Approximately how many hours were spent working on this submission in total?

100 hours.

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2015 is complete? YES

 

 

Video Download

Video:

http://pixel.ecn.purdue.edu:8080/~zhan1486/VASTCHALLENGE15/GC.wmv

 

 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Questions

 

For each of the following questions, consider both the movement and communications data.

GC.1 – Scott is not a paying customer and does not have an ID. Describe Scott Jones’ activities in the park during the three-day weekend. Who does he spend most of his time with? When does he arrive? When does he leave? What route does he follow?

Limit your response to no more than 10 images and 1000 words.


Figure 1-1. Heatmaps and line graphs based on check-in data.



Figure 1-2. Each figure (top) shows the major movements between 2:00 PM and 3:00 PM on each of the three days. One of the major flows heads toward the event stage, except on Sunday.
Each figure (bottom) presents how visitors move away from the stage after the show.


To begin our search for Scott Jones, we assumed that he attracts a lot of attention from the crowd and that his performances generate a great deal of interest. There were two possible places Scott could perform at the park: a theatre or a stage. From the heatmap and line graphs of check-in frequencies for each attraction (Figure 1-1), we observe that the performance stage (#63) became popular at several times throughout the weekend, with people frequently checking in during the 9:00 AM and 2:00 PM hours. Furthermore, the flow visualization of visitors’ movements in Figure 1-2 supports our hypothesis: thick arrows show people moving toward the stage before the show began (Figure 1-2 (Top)) and away from the stage area after the show finished (Figure 1-2 (Bottom)) on Friday and Saturday.

Upon closer examination of the check-in data (Figure 1-1), we find that the majority of check-ins occurred later in the hour, at around 9:45 AM and 2:45 PM. These results lead us to conclude that Scott’s performances started at 10:00 AM and 3:00 PM, with people arriving a short time beforehand to get seats.
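The binning step behind this observation can be sketched as follows. This is a minimal illustration, not our tool's code: the timestamp format, the 15-minute bin width, and the sample values are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime

def checkin_histogram(timestamps, bin_minutes=15):
    """Count check-ins per time-of-day bin; keys are 'HH:MM' bin starts."""
    counts = Counter()
    for ts in timestamps:
        # Assumed timestamp format; adjust to the actual data file.
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        minute = (t.minute // bin_minutes) * bin_minutes
        counts[f"{t.hour:02d}:{minute:02d}"] += 1
    return counts

# Made-up check-in times at one attraction (not challenge data).
sample = [
    "2014-06-06 09:05:00", "2014-06-06 09:47:10", "2014-06-06 09:50:30",
    "2014-06-06 09:58:00", "2014-06-06 14:46:00", "2014-06-06 14:52:00",
]
hist = checkin_histogram(sample)
peak_bin = max(hist, key=hist.get)  # bin with the most check-ins
print(peak_bin, hist[peak_bin])
```

A narrower bin width makes a late-in-the-hour concentration easier to see than an hourly count would.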


Figure 1-3: Communication data from the Coaster Alley region over the three days.
Note the peaks (red boxes) in the number of communications.


To determine when Scott’s performances ended, we examined the communication data and noticed a spike of messages sent from the Coaster Alley region an hour after each show started, as shown in Figure 1-3. Since the stage is located in Coaster Alley, we believe that people began communicating as soon as the show was over.
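We identified these peaks visually. For readers who want an automated check, one possible approach is a robust threshold on binned message counts (median plus a multiple of the median absolute deviation). The rule, the multiplier k, and the counts below are assumptions for illustration only, not part of our analysis.

```python
from statistics import median

def find_spikes(counts, k=5.0):
    """Return indices of bins whose count exceeds median + k * MAD.

    MAD (median absolute deviation) is used instead of the standard
    deviation so that the spikes themselves do not inflate the threshold.
    """
    med = median(counts)
    mad = median(abs(c - med) for c in counts)
    return [i for i, c in enumerate(counts) if c > med + k * mad]

# Made-up message counts per time bin for one region (not challenge data).
binned = [120, 130, 125, 900, 140, 135, 128, 132, 880, 138]
print(find_spikes(binned))
```

If the MAD is zero (all typical bins identical), any bin above the median is flagged, so k may need tuning on real data.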

Thus, each show lasted for an hour, and on Friday and Saturday Scott was at the performance stage from at least 10:00 AM-11:00 AM and 3:00 PM-4:00 PM. On Sunday, Scott performed only the morning show. From this we infer that Scott focused on his performances while he was in the park.

We assumed that when people saw Scott, they sent messages to each other, but we were unable to find any evidence for this hypothesis. An inquiry into anomalous movements of people in the park over the three days, on the other hand, led to interesting insights. While looking for people who came to the park just to see Scott, we examined the IDs that had only one or two check-ins. Our hypothesis was that these people would not be visiting the attractions, as they were likely going to the park just to see Scott. Our analysis gave us two IDs of people who traveled together and took a path we believe to revolve around Scott. IDs 521750 and 644885 were selected using our tool, and we then traced their movements around the park as shown in Figure 1-5.
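The filtering step described above (keeping only IDs with one or two check-ins) could be expressed as the following sketch. The record layout, the "check-in" and "movement" labels, and the sample IDs are assumptions for illustration; in practice we selected the IDs interactively in our tool.

```python
from collections import Counter

def ids_with_few_checkins(records, min_checkins=1, max_checkins=2):
    """records: iterable of (visitor_id, record_type) pairs.

    Returns the sorted IDs whose number of 'check-in' records lies in
    [min_checkins, max_checkins].
    """
    checkins = Counter()
    seen = set()
    for visitor_id, record_type in records:
        seen.add(visitor_id)
        if record_type == "check-in":
            checkins[visitor_id] += 1
    return sorted(v for v in seen
                  if min_checkins <= checkins[v] <= max_checkins)

# Made-up records with hypothetical IDs (not challenge data).
records = [
    (1, "check-in"), (1, "movement"), (1, "movement"),
    (2, "check-in"), (2, "check-in"), (2, "check-in"), (2, "movement"),
    (3, "movement"), (3, "check-in"), (3, "movement"), (3, "check-in"),
]
few = ids_with_few_checkins(records)
print(few)
```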


Figure 1-5. The first row shows the movements of 644885 between 8:40 AM and 10:00 AM.
The second row shows the major movements of people for the same time frames.


These two people arrived at the park through the right entrance and took a loop through the Entry Corridor and Tundra Land on their way down to the performance stage, as shown in the first row of Figure 1-5. The second row of Figure 1-5 shows the corresponding movement clusters of park visitors for the same time periods. Certain major movements of the public correspond to the movement patterns of the two individuals, indicating that the public would flock to see Scott.

The two individuals arrive at the performance stage half an hour before Scott’s performance, but do not check in. This seems strange because they appear to be focused on Scott, yet do not actually get tickets to see his show. They wait outside the stage area for two hours (until about half an hour after Scott finishes the show), and then leave the park along the same path by which they entered. It appears that they leave at 12:15 PM for lunch. They then return at 1:45 PM by the same path as in the morning, and again do not check in during the second show. They do this every day except Sunday, when they leave the park for lunch and do not return in the afternoon (because the second performance of the day was cancelled).

The fact that these people come to the park each day specifically for Scott’s performance, but do not actually enter the stage area, indicates that they are not just fans of Scott’s. It is likely that they are either security for Scott or members of his entourage/friends accompanying him on his path through the park each day. Their timeline supports this, because they show up half an hour before each show and leave half an hour after. If they were travelling with Scott, it would make sense that he needs to get to his show relatively early to prepare. He would also need to wait a while after the show ends before leaving, so that he does not get mobbed by the large flux of visitors leaving the stage area.

Therefore, we conclude that these two IDs, whether security or simply acquaintances, follow Scott in the park the entire time he is there. It follows that Scott takes the same path when he enters and leaves the park, both in the morning and in the afternoon. Scott walks along the path shown in Figure 1-5 from 8:45 AM-9:30 AM, prepares and performs at the stage from 9:30 AM-11:30 AM, and then walks back along the same path from 11:30 AM-12:15 PM when he leaves the park for lunch. He repeats the pattern with the same timing when he returns to the park at 1:45 PM on Friday and Saturday. Since we have determined that the second performance on Sunday was cancelled, we also conclude that Scott was notified of the crime and the show cancellation, so he did not return to the park in the afternoon. This is supported by the absence on Sunday afternoon of the two IDs that were following him in the park.

 

GC.2 – Identify up to 8 issues with park operations during the three-day weekend.  Provide a rationale for your answers.

               1.   The app the park provides is useful for both the visitors and the park administrators. One issue with the app’s movement monitoring, from the park administrators’ perspective, is the difference between check-ins and movements. The check-in feature currently applies only to attractions and rides, and does not cover places for food, shopping, restrooms, or miscellaneous features. Thus, a record classified as movement is not always actual movement, which skews the data so that movements seem longer than they actually are. To clarify the type of data for park analysts, the app should record the time a person spends at one of these lesser features/attractions as a check-in, so that any stop is indicated as such.

               2.   The park employees responsible for the pavilion made a couple of mistakes on Sunday. The pavilion was locked before Scott Jones’ performances, which meant the pavilion was closed each morning from 10:00 AM-11:00 AM (as can be seen in Figure 2-1).


Figure 2-1: No check-in allowed for the stage on Sunday from 10-11 AM.

On Sunday, however, the park employee responsible for the locking and unlocking procedure failed to unlock the pavilion until about 11:30 AM, giving the criminals another half hour to vandalize the area. The employee also locked the pavilion without checking whether anyone remained inside, because he left the three suspects (IDs 416790, 461004, and 1502920) in there alone during the time it was locked.

Either the park employee was involved with the crime or he did not complete his job properly, depriving visitors of their time at the pavilion and providing the suspects with more time to commit their crime.

               3.   Several attractions had much longer check-in times, likely due to their popularity, creating long lines and forcing people to wait for long periods. Our mapping tool shows that several people spend large portions of their time at certain rides. This can be observed in Figure 2-2 (Top, x-axis: attraction, y-axis: number of check-ins). The attractions highlighted in this figure are thrill rides. Further, Figure 2-2 (Bottom) shows the average wait time for the thrill rides, where we find that the wait times for Attractions 4, 5, and 7 are exceedingly long.

Many people who attempt to enjoy a popular attraction can experience only that one ride in a whole hour of their time. Therefore, people trying to fit the most popular attractions into their day will struggle to have time left to see and experience the entire park. As a result, the less popular areas of the park are underused and contribute less to park earnings than they would if people had shorter waits at the larger attractions. One way to mitigate this issue is to restrict a person from checking in within a given time. This policy can help balance the number of visitors among attractions.


Figure 2-2. Check-in line chart for all people on Saturday.


               4.   The pavilion and the performance stage for Scott’s shows each had only one entrance. Unlike the pavilion, which houses valuable memorabilia, the stage area should have multiple entrances/exits connected to the path to alleviate the rush of people caused by a feature show such as Scott’s. Also, the areas would be more accessible if they had been built at the center of the park, so that people did not create a massive flow down to these regions when they enter the park or when the show starts, as shown by the thick arrows in Figure 1-2.

               5.   One of the largest breaks from the norm is the group of people who check in only at the entrance to the park. Between 20 and 30 people throughout the weekend check in at the park entrance and then do not check in at any rides or attractions, as shown in the image below. For example, the IDs listed below are the customers who checked in to the park on Friday, but did not check in at any other place. We recommend that park management detect such anomalous behaviors so that it is alerted to potentially suspicious behavior, or simply to people in need of assistance.


Figure 2-3. Visitor clusters based on sequence of attractions that they check in.


               6.   With our sequence-based clustering technique (see Appendix for details), we identify various types of groups with different patterns of attraction use on different days, as shown in Figure 2-3. For instance, the visitors in the largest cluster (6839 visitors) on Sunday prefer thrill rides (green) and kiddie rides (purple). However, the individual thrill rides are spread across the park, which is not convenient for visitors. We recommend placing thrill rides closer to each other, unless their distribution across the park is intended to mitigate crowds and increase revenue from nearby refreshment stands and souvenir shops.

               7.   The park administrators also need to be cognizant of malicious people trying to disable their tracking devices. We identified one person, ID 392618, who may have tampered with his device. This person suddenly jumped to building 37 around 8:43 PM, as shown in Figure 2-4.
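A sudden jump of this kind can also be flagged automatically by checking the implied speed between consecutive position records. The sketch below is only an illustration: the (time, x, y) layout, the grid units, the speed threshold, and the sample track are assumptions, not values taken from the challenge data.

```python
from math import hypot

def find_jumps(track, max_speed=2.0):
    """track: list of (t_seconds, x, y) tuples sorted by time.

    Returns (t_prev, t_next) pairs where the straight-line speed between
    consecutive records exceeds max_speed (grid units per second).
    """
    jumps = []
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        if hypot(x1 - x0, y1 - y0) / dt > max_speed:
            jumps.append((t0, t1))
    return jumps

# Made-up track: steady walking, then an implausible leap (not challenge data).
track = [(0, 10, 10), (5, 11, 10), (10, 12, 10), (15, 60, 80), (20, 60, 81)]
print(find_jumps(track))
```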


Figure 2-4: Abnormal movement analysis of ID 392618.


 

GC.3 – For the crime, describe the following, and provide your rationale:

a.       When did the crime occur?

b.      Where did the crime take place?

c.       Who are the most likely suspects in the crime?

Limit your response to no more than 5 images and 500 words.

 

 

               a.   Several factors indicate that the crime occurred between 10:00 AM and 11:30 AM on Sunday. The communication time series graph in Figure 3-1 shows that the messages sent from the Wet Land area spiked unusually from 11:30 AM-12:00 PM. This is likely because the people checking in to the pavilion after Scott’s performance found the vandalism and saw that the medal was stolen. Further, the check-ins to the pavilion end at 10:00 AM, when the park locks it for the morning show, and then restart at 11:30 AM. However, the last check-in is right before noon, and after 12:00 PM no one is allowed to check in to the pavilion for the rest of the day. This reinforces the idea that the crime was discovered at 11:30 AM. Thus, we conclude that the crime occurred somewhere within this time frame (10:00 AM-11:30 AM).


Figure 3-1. Communication patterns on Sunday for Wet Land.


               b.   From the news report, we learn that the crime occurred at the pavilion (building #32). This is confirmed by observing that the communications from the pavilion spike on Sunday, and that no one is allowed to check in there for the entire afternoon and evening (Figure 3-2).


Figure 3-2: No check-ins are observed in the pavilion after 12:00 PM on Sunday.


               c.   Our goal in searching for suspects in the crime was to highlight suspicious behavior of individuals on Sunday. Having narrowed down the time window and place of the crime, we found that three IDs (416790, 461004, and 1502920) spent 2.5 hours in the pavilion, and were in there alone from 10:00 AM-11:30 AM. This time frame matches our hypothesis of when the crime occurred. Also, no one else was in the pavilion between 10:00 AM and 11:00 AM on any day of the weekend. Furthermore, Figure 3-3 shows that these three suspects checked in frequently at areas around Attraction 32, which indicates that they specifically moved around this area.
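The search for people present in the pavilion during the crime window amounts to an interval-overlap test. The sketch below assumes that each visit has already been reduced to an (ID, enter, leave) triple in minutes since midnight; that representation, the placeholder IDs, and the sample times are hypothetical and chosen only for illustration.

```python
def present_during(stays, start, end):
    """stays: iterable of (visitor_id, enter, leave), minutes since midnight.

    Returns the sorted IDs whose stay overlaps the window [start, end).
    """
    return sorted({vid for vid, enter, leave in stays
                   if enter < end and leave > start})

# Made-up stays with placeholder IDs (not challenge data).
stays = [
    ("A", 9 * 60, 11 * 60 + 30),       # inside for the whole window
    ("B", 9 * 60 + 10, 11 * 60 + 30),  # inside for the whole window
    ("C", 9 * 60, 9 * 60 + 50),        # left before the window opened
    ("D", 11 * 60 + 35, 12 * 60),      # entered after the window closed
]
window_ids = present_during(stays, 10 * 60, 11 * 60 + 30)
print(window_ids)
```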


Figure 3-3: The trajectories and check-in hotspots of three IDs, 416790, 461004, and 1502920, on Sunday from 8:00 AM to 11:00 PM.



Figure 3-4: Linkages between potential suspects and accomplices as derived from communication data.


Additionally, they communicated among themselves during this time period, indicating that they were working together (Figure 3-4). Therefore, we are convinced that these three individuals are the most likely suspects in the vandalism and the theft of the medal.

We are also considering the persons with IDs 1123214, 1350546, 1000279, and 1187909 as possible accomplices. These four people were heavily involved in communications with the suspects, and even checked in to the same rides throughout the day, with some overlapping times when the suspects may have handed off the medal to them. Note that these seven individuals also talk to people outside the park, which may further indicate an external mastermind of the crime.

Finally, we note that the people with IDs 1711922, 430595, and 921888 could be possible accomplices or witnesses. These people stood outside the pavilion for long periods (between 10 and 30 minutes), and could have been either guarding the entrance or just waiting for the pavilion to be unlocked. Note that IDs 1711922 and 430595 communicate heavily with the park security officials, which may indicate that they are part of the security staff (Figure 3-4). Either way, these three people should also be questioned.

Appendix

Clustering algorithm used for Figure 2-3: We implemented sequence-based clustering to group people based on their check-in sequences, expressed as categories of attractions. In this approach, we first find the longest common subsequence (LCS) to measure the similarity between two customers’ sequences. Then, we apply a density-based clustering algorithm, DBSCAN, to group customers.
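The two steps above can be sketched in plain Python as follows. This is a simplified illustration rather than our implementation: the normalization of the LCS length into a distance, the eps and min_pts values, and the single-letter category codes are assumptions made for the example.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def lcs_distance(a, b):
    """Assumed normalization: 1 - LCS / length of the longer sequence."""
    longest = max(len(a), len(b))
    return 1.0 - lcs_length(a, b) / longest if longest else 0.0

def dbscan(items, dist, eps, min_pts):
    """Minimal DBSCAN; returns one label per item (-1 means noise)."""
    labels = [None] * len(items)  # None = not yet visited

    def neighbors(i):
        return [j for j in range(len(items)) if dist(items[i], items[j]) <= eps]

    cluster = -1
    for i in range(len(items)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(nb)
    return labels

# Made-up category sequences: T=thrill, K=kiddie, S=show, F=food, B=other.
seqs = ["TTKT", "TTKK", "TKTT", "SSFS", "SSFF", "B"]
labels = dbscan(seqs, lcs_distance, eps=0.5, min_pts=2)
print(labels)
```

The quadratic neighbor search is acceptable for a small sample; a precomputed distance matrix or an indexed search would be needed for thousands of visitors.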

Trajectory clustering algorithm (flow visualization with arrows)
We group the individual trajectories into classes of similar sub-trajectories using a trajectory clustering model based on the partition-and-group framework, enabling users to discover common sub-patterns, rather than just seeing common holistic patterns.